Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Approximation Algorithms: Full Version
Abstract
The multi-armed bandit (MAB) problem features the classical tradeoff between exploration and exploitation. The input specifies several stochastic arms which evolve with each pull, and the goal is to maximize the expected reward after a fixed budget of pulls. The celebrated work of Gittins et al. [GGW89] presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also generalizes to the case of MAB superprocesses with (stochastic) multi-period actions. This generalization captures the framework introduced by Guha and Munagala in [GM07a, GM07b], and yields new results for their budgeted learning problems. Also, we obtain a (1/2 − ε)-approximation for the variant of MAB where preemption (playing an arm, switching to another arm, then coming back to the first arm) is not allowed. This contains the stochastic knapsack problem of Dean, Goemans, and Vondrák [DGV08] with correlated rewards, where we are given a knapsack of fixed size and a set of jobs, each with a joint distribution for its size and reward. The actual size and reward of a job can only be discovered in real time as it is being scheduled, and the objective is to maximize the expected reward collected before the knapsack size is exhausted. Our (1/2 − ε)-approximation improves the 1/16- and 1/8-approximations of [GKMR11] for correlated stochastic knapsack with cancellation and no cancellation, respectively, providing the first tight algorithms for these problems, matching the LP integrality gap of 2. We sample probabilities from an exponential-sized dynamic programming solution, whose existence is guaranteed by an LP projection argument. We hope this technique can also be applied to other dynamic programming problems which can be projected down onto a small LP.
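To make the correlated stochastic knapsack model concrete, the following Python sketch treats each job as a joint distribution over (size, reward) outcomes, pays a job's reward only if it completes within the budget, and estimates the expected reward of a fixed ordering by simulation. The particular jobs and the greedy reward-density ordering are illustrative assumptions, not from the paper; the paper's algorithms are adaptive and strictly more powerful than such a non-adaptive baseline.

    import random

    # A job is a joint distribution over (size, reward) outcomes, encoded as
    # a list of (probability, size, reward) triples; the realized reward is
    # correlated with the realized size.
    jobs = [
        [(0.5, 1, 2.0), (0.5, 3, 9.0)],  # hypothetical job 0
        [(0.8, 2, 3.0), (0.2, 5, 0.0)],  # hypothetical job 1
        [(1.0, 2, 4.0)],                 # hypothetical job 2 (deterministic)
    ]
    BUDGET = 5  # knapsack size

    def simulate(order, budget):
        """Schedule jobs in the given order; a job's reward counts only if
        the job finishes before the budget runs out (the [DGV08] convention)."""
        remaining, total = budget, 0.0
        for j in order:
            u = random.random()
            for prob, size, reward in jobs[j]:  # sample one joint outcome
                u -= prob
                if u <= 0:
                    break
            if size > remaining:  # job cannot finish: no reward, and we stop
                return total
            remaining -= size
            total += reward
        return total

    def expected_reward(order, budget, trials=100_000):
        return sum(simulate(order, budget) for _ in range(trials)) / trials

    # Non-adaptive baseline: sort by expected reward per unit of expected size.
    def density(dist):
        return (sum(p * r for p, _, r in dist)
                / sum(p * s for p, s, _ in dist))

    order = sorted(range(len(jobs)), key=lambda j: -density(jobs[j]))
    print(order, expected_reward(order, BUDGET))

An adaptive policy would instead choose the next job based on the budget actually remaining, which is exactly the gap the LP relaxations and their integrality gap of 2 quantify.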
Similar resources
Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Algorithms: Extended Abstract
The celebrated multi-armed bandit (MAB) problem, originating from the work of Gittins et al. [GGW89], presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also gen...
Anytime optimal algorithms in stochastic multi-armed bandits
We introduce an anytime algorithm for stochastic multi-armed bandits with optimal distribution-free and distribution-dependent bounds (for a specific family of parameters). The performance of this algorithm (as well as another one motivated by the conjectured optimal bound) is evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.
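For intuition about what "anytime" means here, the sketch below shows a standard anytime index policy, UCB1 of Auer, Cesa-Bianchi, and Fischer: its index depends only on elapsed time and per-arm statistics, so it needs no knowledge of the horizon and can be stopped at any point. This is background illustration only, not the (tighter) algorithm of the cited paper.

    import math, random

    def ucb1(arms, horizon):
        """Run the UCB1 index policy; `arms` is a list of no-argument
        callables returning stochastic rewards. Returns total reward."""
        n = len(arms)
        counts = [0] * n      # pulls per arm
        means = [0.0] * n     # empirical mean reward per arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= n:        # play each arm once to initialize
                i = t - 1
            else:             # pick the arm with the largest upper index
                i = max(range(n), key=lambda a: means[a]
                        + math.sqrt(2 * math.log(t) / counts[a]))
            r = arms[i]()
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # running mean update
            total += r
        return total

    # Two illustrative Bernoulli arms with means 0.4 and 0.6.
    arms = [lambda: float(random.random() < 0.4),
            lambda: float(random.random() < 0.6)]
    print(ucb1(arms, 10_000))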
Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items
Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...
Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items (Extended Abstract)
Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...
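The deterministic endpoint of the bridge described above is the classic NP-hard 0/1 knapsack problem; as background (standard textbook material, not from the cited paper), a minimal dynamic-programming sketch follows. KPPI layers random perishing of items on top of this static model.

    def knapsack(items, capacity):
        """Classic 0/1 knapsack dynamic program over integer capacities.
        `items` is a list of (size, value) pairs; returns the maximum
        total value achievable within `capacity`."""
        best = [0] * (capacity + 1)
        for size, value in items:
            # Iterate capacities in reverse so each item is used at most once.
            for c in range(capacity, size - 1, -1):
                best[c] = max(best[c], best[c - size] + value)
        return best[capacity]

    print(knapsack([(3, 4), (2, 3), (4, 6)], 5))  # -> 7 (items of size 3 and 2)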
On Index Policies for Restless Bandit Problems
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling...
Publication date: 2013